# Optimizing the Photodetector/Analog Front-End Interface in Optical Communication Receivers 

Bahaa Radi, Member, IEEE, Zonghao Li, Member, IEEE, Dhruv Patel, Member, IEEE, and Anthony Chan Carusone, Fellow, IEEE


#### Abstract

This article addresses the optimization of the interface between the photodetector (PD) and the analog front-end in high-speed, high-density optical communication receivers. Specifically, the article focuses on optimizing design elements in the interface, including the interconnecting transmission line, the T-coil, the transimpedance amplifier (TIA), and digital equalization tap weights. To optimize the optical link, we use a combination of analytical models, electromagnetic simulations, and machine learning techniques, describing each interface element with the approach most appropriate for it. Finally, we use a genetic algorithm to obtain optimal design parameters. The proposed optimization approach leads to a short design time and reveals insights into some of the best design practices. As an example, we use the proposed method to investigate the relationship between optimal transmission line width and the amount of equalization available on the receiver. These conclusions are further supported by measurements taken on an assembled prototype with various PD-to-TIA interconnect lengths.


Index Terms-Circuit noise, decision-feedback equalizer (DFE), feed-forward equalizer (FFE), machine learning (ML), optical receivers, pulse amplitude modulation, sensitivity.

## I. Introduction

TO SUPPORT the demand for the current 400 G and emerging 800 G and 1.6 T Ethernet standards in data centers, the per-lane data rate and the number of lanes have to be increased. Higher order modulation implementations, such as PAM-6 and PAM-8, are in active research to improve the per-lane data rate. Moreover, as the limited bandwidth of the analog front-end (AFE) has an increasingly detrimental effect on intersymbol interference (ISI) for higher order modulation schemes, equalization techniques are used to account for the limited bandwidth. On the other hand, increasing the number of lanes presents packaging challenges on the receiver side.

Many integrated CMOS optical receivers have been developed, allowing the AFE and the SerDes circuits to coexist on one chip, such as the $100 \mathrm{~Gb} / \mathrm{s}$ 4-PAM optical receiver in [1], the linear transimpedance amplifier (TIA) in 16 nm FinFET in [2], the linear TIA in 28 nm CMOS in [3], and the linear TIA copackaged with the photodiode in [4]. However, the photodetector (PD) remains a discrete component. Since silicon-based CMOS technologies are not optimized for efficient light absorption [5], PDs are typically made from germanium or compound semiconductors (e.g., InGaAs), which offer better sensitivity and responsivity to light [6]. Such PDs can be designed and optimized independently or in an array to achieve a desired combination of high responsivity, low noise, and wide bandwidth. Alternatively, they may be integrated into a silicon photonic platform alongside other optical components. In either case, the PD is generally not monolithically integrated with a DSP-based equalizing front-end, which implies package-level heterogeneous integration of the PD and front-end. As the number of data lanes increases, the spacing between the discrete PDs and their corresponding front-ends will inevitably increase, as shown in Fig. 1. Moreover, this leads to different interconnect lengths between the PD and the AFE. Consequently, more parasitics will be present at the optical receiver's input. Signal integrity impairments, such as reflections, will manifest at the interface between the PD and the AFE. To mitigate these impairments and the impact of the added parasitics on AFE performance, the package and the AFE should be co-designed for optimal performance. Moreover, the optimal AFE design is different for various interconnect lengths. This necessitates developing an automated and fast optimization flow that takes interconnect design into account.

[^0]

Fig. 1. Illustration of increased interconnect density leading to longer and potentially different interconnect lengths.

System-level high-speed data link modeling and optimization have been studied intensively in recent years. Prior work, such as [7], [8], and [9], primarily focused on modeling equalizers without detailed consideration of the preceding AFE. In particular, the authors in [8] and [9] do not take noise into account. Manukovsky et al. [10] used machine learning (ML) techniques to model SerDes systems without providing much design insight. Yang et al. [11] presented the results of an IBIS-AMI holistic model, but without describing many key implementation details.

Fig. 2. Interface packaging (top) and the corresponding models used for the optimization (bottom). Colors are used to delineate which model corresponds to which component. Design parameters are annotated in red, which are: transmission line width TW, T-coil geometric parameters $L, W, S, N_{\text {in }}$, and $N_{\text {out }}$, TIA modeling parameters $R_{F}, C_{g s}, C_{g d}$, and $g_{m}$, FFE tap weights $w$, and DFE tap weights $b$.

This article studies the modeling and optimization of the packaging interface and the AFE of an optical communication receiver holistically. In particular, the following contributions distinguish this work from prior art. First, we discuss in detail how each AFE block is modeled and make the models open source ${ }^{1}$ so that readers can reuse the information provided in this work. We use foundry-provided models to accurately capture their impact on the design. Second, we take both jitter and noise into account so that their degradation of the channel performance can be investigated. Third, we include the T-coil in our link model, and we apply some novel ML techniques to accelerate its modeling.

The rest of this article is organized as follows. Section II describes the modeling of the parts of the interface under consideration. Section III discusses the optimization procedure. Section IV presents the optimization results and discussion. Section V presents the experimental validations. Finally, Section VI concludes this article.

## II. Modeling the Interface

This article considers 4-PAM modulation with a baud rate of 64 Gbaud ($128 \mathrm{~Gb} / \mathrm{s}$). The interface is shown in Fig. 2, along with the corresponding models; it comprises a discrete PD connected to the optical receiver through a packaging interconnect. A low-cost organic substrate is assumed here. Electrostatic discharge (ESD) protection circuits are required to prevent damage during manufacturing, assembly, or use of the components. The ESD circuits introduce parasitic capacitances that could harm performance. To ameliorate this, a bridged T-coil circuit is introduced to extend bandwidth. The TIA that follows the T-coil is in turn followed by variable-gain amplifier (VGA) stages and an analog-to-digital converter (ADC). In the following optimization, we assume the noise and impairments of the VGA and ADC are negligible compared with the noise and bandwidth limitations of the TIA. Finally, digital equalization is used to remove ISI at the output of the TIA. A model of the complete front-end is formed by combining analytical (two-port linear) models, electromagnetic (EM) simulations, and ML techniques for each element in the model as appropriate. Each component in Fig. 2 will be described in detail in the following sections.

[^1]

Fig. 3. Stack of the organic substrate used for transmission line simulations.

## A. Modeling the PD and the Interconnect

The PD is modeled as an ideal current source in parallel with a junction capacitance, $C_{\mathrm{PD}}$, of 10.7 fF and a series resistance, $R_{\mathrm{PD}}$, of $87 \Omega$. These values are based on the GlobalFoundries (GF) 45SPCLO CMOS (silicon photonics) process but are also consistent with standalone germanium PDs and other silicon photonics [12]. The rise and fall times of the signal are assumed to be 6 ps, corresponding to around 0.4 unit interval (UI) at 64 Gbaud. We also assume that the PD generates a peak-to-peak current, $I_{\mathrm{pp}}$, of $100 \mu \mathrm{~A}$. The PD source impedance is analytically modeled using the following ABCD matrix:

$$
T_{\mathrm{PD}}=\left[\begin{array}{cc}
1 & R_{\mathrm{PD}}  \tag{1}\\
s C_{\mathrm{PD}} & s R_{\mathrm{PD}} C_{\mathrm{PD}}+1
\end{array}\right]
$$

where $s$ is the complex frequency. In terms of packaging, we assume flip-chip packaging of the PD die and the AFE die onto an organic substrate [13]. The solder bump introduces parasitics and a discontinuity. It is modeled as a 10 fF shunt capacitance, $C_{\text {bump }}$, and a 20 pH series inductance, $L_{\text {bump }}$ [14]. The ABCD matrix of the bump on the PD side, $T_{\text {bump }}$, is given by the following:

$$
T_{\text {bump }}=\left[\begin{array}{cc}
1+s^{2} L_{\mathrm{bump}} C_{\mathrm{bump}} & s L_{\mathrm{bump}}  \tag{2}\\
s C_{\mathrm{bump}} & 1
\end{array}\right] .
$$

As the distance between PDs and AFEs increases, the trace interconnecting the two should be designed as a transmission line to alleviate reflections and signal degradation. Thus, here we consider a transmission line connecting the PD to the AFE. We consider transmission line lengths, TL, of 250 and $500 \mu \mathrm{~m}$, which are typical distances between the PD and the AFE. We also consider a hypothetical transmission line length of 5 mm. Such a long transmission line may be needed to support future high-density interconnects where a large number of PDs are arrayed around and connected to the receiver IC. For a given length TL, the width of the transmission line, TW, is a design parameter, and it is assumed to be bounded between 15 and $100 \mu \mathrm{~m}$ with $5 \mu \mathrm{~m}$ steps. The lower limit is set by the minimum trace width that can be manufactured on the organic substrate, while the upper limit is chosen to permit high interconnect density. The width of the transmission line controls its characteristic impedance. The microstrip transmission line was simulated using Ansys HFSS over the design space to obtain its ABCD matrix, $T_{\mathrm{TL}}$, as a function of frequency. We note that the EM simulations consider losses in the transmission line. The organic substrate stack shown in Fig. 3 was used in these EM simulations. The model assumes an epoxy-based substrate dielectric material developed for high-speed and low-dielectric-loss applications [15]. It has a relative dielectric constant of 3.3, a dielectric loss at 5.8 GHz of 0.0044, and a surface roughness of 200 nm. With a $15 \mu \mathrm{~m}$ wide transmission line, this results in a loss of 0.1 dB for $250 \mu \mathrm{~m}$, 0.3 dB for $500 \mu \mathrm{~m}$, and 1.1 dB for 5 mm at the Nyquist frequency of 32 GHz.

Similar to the solder bump on the PD side, there is a solder bump connecting the transmission line to the receiver IC, $T_{\text {bump,rx }}$. The ABCD matrix of this bump is given by the following:

$$
T_{\text {bump }, \mathrm{rx}}=\left[\begin{array}{cc}
1 & s L_{\mathrm{bump}}  \tag{3}\\
s C_{\mathrm{bump}} & 1+s^{2} L_{\mathrm{bump}} C_{\mathrm{bump}}
\end{array}\right] .
$$

The pad on the receiver side introduces a relatively large capacitance that creates a discontinuity at the interface and introduces a pole at the input of the AFE, limiting bandwidth.


Fig. 4. (a) T-coil-enhanced ESD circuit. (b) T-coil layout, where in this case $N_{\text {in }}=4$ and $N_{\text {out }}=5$ [18].

Here, we assume a fixed pad size regardless of the transmission line width. This assumption is made considering that having a large pad for bonding is necessary. A typical capacitance of $C_{\mathrm{PAD}}=100 \mathrm{fF}$ is modeled with the following ABCD matrix:

$$
T_{\mathrm{PAD}}=\left[\begin{array}{cc}
1 & 0  \tag{4}\\
s C_{\mathrm{PAD}} & 1
\end{array}\right]
$$
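The lumped blocks in (1), (2), and (4), together with the receiver-side bump, are cascades of series and shunt elements, so each can be assembled from two primitive ABCD matrices. The following Python sketch illustrates this using the element values quoted above. The helper names (`shunt_c`, `series_z`, `t_pd`, and so on) are hypothetical and are not taken from the released models; NumPy is assumed.

```python
import numpy as np

# Element values quoted in the text
R_PD, C_PD = 87.0, 10.7e-15        # PD series resistance, junction capacitance
L_BUMP, C_BUMP = 20e-12, 10e-15    # solder-bump inductance and capacitance
C_PAD = 100e-15                    # receiver pad capacitance

def shunt_c(s, c):
    """ABCD matrix of a shunt capacitor (hypothetical helper)."""
    return np.array([[1, 0], [s * c, 1]], dtype=complex)

def series_z(z):
    """ABCD matrix of a series impedance (hypothetical helper)."""
    return np.array([[1, z], [0, 1]], dtype=complex)

def t_pd(s):
    # Shunt C_PD followed by series R_PD reproduces Eq. (1)
    return shunt_c(s, C_PD) @ series_z(R_PD)

def t_bump(s):
    # Series L_bump followed by shunt C_bump reproduces Eq. (2)
    return series_z(s * L_BUMP) @ shunt_c(s, C_BUMP)

def t_bump_rx(s):
    # Mirror image of the PD-side bump: shunt C_bump, then series L_bump.
    # Its lower-right entry evaluates to 1 + s^2 * L_bump * C_bump.
    return shunt_c(s, C_BUMP) @ series_z(s * L_BUMP)

def t_pad(s):
    # Single shunt capacitor, Eq. (4)
    return shunt_c(s, C_PAD)

s = 2j * np.pi * 32e9   # evaluate at the 32 GHz Nyquist frequency
```

Because every block is a reciprocal two-port, each matrix has unit determinant, which is a convenient sanity check when transcribing the equations.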

## B. ML Model for T-Coil S-Parameters Predictions

A bridged T-coil is often incorporated at the input of the AFE to offset the impact of the ESD capacitance $C_{\text {esd }}$, which is assumed to be 80 fF at the receiver's input, as shown in Fig. 2. The capacitor $C_{\text {esd }}$ is necessary to protect the circuit from ESD. However, $C_{\text {esd }}$ introduces a low-frequency pole, which decreases the front-end bandwidth. The T-coil ameliorates the impact of $C_{\text {esd }}$. Intuitively, the T-coil essentially introduces inductance on either side of $C_{\text {esd }}$, creating an artificial LC transmission line that increases the front-end bandwidth while introducing a small delay [16]. The T-coil can be modeled as two mutually coupled inductors with a bridge capacitance [17]. Fig. 4(a) shows the T-coil-enhanced ESD circuit, and Fig. 4(b) shows the layout of a T-coil.

Modeling the T-coil while sweeping circuit element parameters [such as $R, C, k$, and $C_{\mathrm{br}}$ in Fig. 4(a)] may lead to an unrealistic T-coil design because the values depend on the physical geometry of the T-coil. This makes it challenging to perform optimization by sweeping design variables. An alternative approach is to use EM simulators to model the T-coil while sweeping the T-coil geometric parameters. However, this can be problematic because EM simulations are time consuming, especially considering the large design space in which many T-coil designs need to be evaluated. To resolve these challenges, we leveraged the neural network (NN) proposed in [18] to promptly predict each T-coil's S-parameters over a wide frequency range. The NN takes the T-coil's geometric parameters as inputs and quickly predicts the S-parameters, allowing for accelerated optimization iterations. The design geometric parameters, as shown in Fig. 4(b), are the T-coil length, $L$, width, $W$, metal spacing, $S$, inner number of turns, $N_{\text {in }}$, and outer number of turns, $N_{\text {out }}$.

TABLE I
Geometric Parameters of T-Coil in GF 22 nm FD-SOI

| Parameter | Unit | Min | Max |
| :--- | :--- | :--- | :--- |
| Outer Diameter $L$ | $\mu \mathrm{~m}$ | 32 | 80 |
| Metal Width $W$ | $\mu \mathrm{~m}$ | 2.4 | 5 |
| Metal Spacing $S$ | $\mu \mathrm{~m}$ | 1.2 | 1.44 |
| Inner Segments $N_{\text {in }}$ | - | 5 | 25 |
| Outer Segments $N_{\text {out }}$ | - | 4 | 12 |

Fig. 5. (a) Structure of DeConv layer. (b) Structure of UpCNN, which consists of a pure upsampling layer and a convolutional layer. The upsampling algorithm used here is the nearest neighbor [18].

To demonstrate the idea of the proposed $\mathrm{NN}^{2}$ and its feasibility, we used a GF 22 nm FD-SOI CMOS process as the targeted technology node here since its design kit has built-in T-coil layouts. However, designers can apply this NN to any other technology node. The T-coil geometric parameter inputs are given in Table I. The NN outputs the real and imaginary parts of the T-coil S-parameters as a function of frequency. Since the number of input geometric parameters is significantly smaller than the number of output S-parameters, a series of upsampling layers is required in the NN. A single-input-multichannel deconvolutional layer [DeConv, shown in Fig. 5(a)] and an upsampling convolutional NN [UpCNN, shown in Fig. 5(b)] are employed to achieve this objective. The upsampling layer can use different upsampling algorithms, such as nearest neighbor and linear interpolation. For simplicity, this work applies the former. Fig. 6 shows the entire structure of the proposed NN. The T-coil's geometric parameters are first mapped to a high-level abstract representation through a multilayer perceptron, which is then passed to the DeConv layer and a series of UpCNNs. The predicted T-coil S-parameters are the final output of the NN. These S-parameters are then converted to ABCD parameters, $T_{T \text { coil }}$, to represent the T-coil network.
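To make the UpCNN idea concrete, the following NumPy sketch shows one block built from a nearest-neighbor upsampling step followed by a 1-D convolution along the frequency axis. It is only an illustration of the layer type named in Fig. 5(b), not the network of [18]: the function names, the kernel shapes, the upsampling factor of 2, and the ReLU activation are all assumptions made here.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbor upsampling: repeat each sample along the last axis."""
    return np.repeat(x, factor, axis=-1)

def conv1d_same(x, kernels):
    """Multi-channel 1-D convolution with zero padding ('same' length).

    x:       array of shape (c_in, length)
    kernels: array of shape (c_out, c_in, k), with k odd
    """
    c_out, c_in, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            # Cross-correlation, as used by deep-learning "convolution" layers
            out[o] += np.correlate(xp[i], kernels[o, i], mode="valid")
    return out

def upcnn_block(x, kernels, factor=2):
    """Upsample, convolve, then apply a ReLU (the activation is assumed)."""
    return np.maximum(conv1d_same(upsample_nearest(x, factor), kernels), 0.0)
```

Stacking several such blocks doubles the length of the frequency axis each time, which is how a short latent vector can be expanded to a full S-parameter sweep.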

[^2]

Fig. 6. Structure of the proposed NN for predicting T-coil S-parameters [18].


Fig. 7. S-parameter EVM mean with 584 test cases.

The NN is trained with S-parameters (dc to 256 GHz) from 2920 T-coils, which were simulated with Cadence EMX using 32 cores of an Intel Xeon Gold 6242R CPU. Preparing these training data takes about 30 h. Training the NN on an NVIDIA RTX A4000 GPU required approximately 10 min. Note that these were one-time efforts for this technology. We used $K$-fold cross-validation to evaluate the NN performance. The loss function used to train and test the proposed model is a modified mean squared error, as in [19]:

$$
\begin{equation*}
L_{\mathrm{freq}}=\frac{1}{N} \sum_{n=1}^{N} \sqrt{\frac{1}{K} \sum_{k=1}^{K}\left(S_{n, k}-\hat{S}_{n, k}\right)^{2}} \tag{5}
\end{equation*}
$$

where $N$ is the number of elements in the training set, $K$ is the number of frequency points, $S_{n, k}$ is the true S-parameters (obtained by EM simulation) at frequency point $k$ for the $n$th T-coil, and $\hat{S}_{n, k}$ is the corresponding prediction. This loss function trains the model to minimize the error across all frequencies.
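As a minimal sketch of (5), the loss can be written in a few lines of NumPy. The function name `freq_loss` and the array layout (one row per T-coil, one column per frequency point, with real and imaginary parts handled as separate real-valued outputs) are assumptions for illustration.

```python
import numpy as np

def freq_loss(s_true, s_pred):
    """Eq. (5): mean over the N T-coils of the RMS error across K frequencies.

    s_true, s_pred: real-valued arrays of shape (N, K).
    """
    err = np.asarray(s_true, dtype=float) - np.asarray(s_pred, dtype=float)
    rms_per_coil = np.sqrt(np.mean(err**2, axis=1))  # inner sum over k
    return float(np.mean(rms_per_coil))              # outer mean over n
```

Taking the square root inside the outer average means that every T-coil contributes its own RMS error, so a few poorly predicted devices do not dominate the loss as strongly as they would under a plain mean squared error.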

One way to assess the accuracy is through the error vector magnitude (EVM) between the NN output and the true EM-simulated S-parameters, given by the following:

$$
\begin{equation*}
\mathrm{EVM}_{n, k}=\sqrt{\left(S_{n, k}-\hat{S}_{n, k}\right)^{2}} \tag{6}
\end{equation*}
$$

where $\mathrm{EVM}_{n, k}$ is the error vector magnitude of the S-parameters at the frequency point $k$ for the $n$th T-coil. Fig. 7 shows the mean S-parameter EVM of the NN output over 584 test cases. It can be seen that the error increases with frequency. However, the Nyquist frequency of the 4-PAM signal considered here is about 32 GHz. According to Fig. 7, the mean EVM over the 584 test T-coils is at most $0.01-0.02$ below the Nyquist frequency, which is about -30 to -40 dB error.

Fig. 8. T-coil pulse response for $\mathrm{TL}=250 \mu \mathrm{~m}$ with $\left[L, W, N_{\text {in }}, N_{\text {out }}\right]=$ [43, 4.2, 11, 6].

Fig. 9. T-coil pulse response for $\mathrm{TL}=500 \mu \mathrm{~m}$ with $\left[L, W, N_{\text {in }}, N_{\text {out }}\right]=$ [44, 4.2, 11, 5].
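The decibel figures quoted above follow from treating the EVM as an amplitude ratio, i.e., $20 \log _{10}(\mathrm{EVM})$; that convention is an assumption made here, and the helper name `evm_db` is hypothetical. A short check:

```python
import math

def evm_db(evm):
    """Convert a linear EVM (amplitude ratio) to decibels."""
    return 20.0 * math.log10(evm)

lower = evm_db(0.01)   # evaluates to -40 dB
upper = evm_db(0.02)   # evaluates to roughly -34 dB
```

Under this convention, the 0.01 to 0.02 range corresponds to roughly -40 to -34 dB, in line with the approximate range stated in the text.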

We have also investigated the performance of our proposed NN in the time domain by examining its derived pulse response. We terminate the middle tap of the T-coil with $C_{\text {esd }}=80 \mathrm{fF}$ and terminate both the input and the output with a $50 \Omega$ resistor. We convolve its impulse response with a current pulse of amplitude $I_{\mathrm{pp}}$ to generate the pulse response, which is then compared with the one generated from the EMX simulation. For example, the optimal T-coils for $\mathrm{TL}=250$ and $500 \mu \mathrm{~m}$ have been examined, with their geometric parameters given in Table III in Section IV. Note that these T-coils are not necessarily matched to $50 \Omega$. Figs. 8 and 9 show the pulse response comparison for these two T-coils. Our model's predictions of the main cursor are reasonably accurate but over-optimistic on the post-cursor reflections, possibly due to the larger EVM at high frequencies, as shown in Fig. 7. The output of the NN is the set of S-parameters, and the cost function during training evaluates only the accuracy of the predicted S-parameters, not the time-domain pulse responses [20]. This is acceptable here since the motivation of our ML model is to replace the EM simulation by promptly predicting the S-parameters of a given T-coil so that the design space can be quickly narrowed down [18].

TABLE II
TIA Parameters Used for Simulations

| Parameter | Description | Value |
| :---: | :---: | :---: |
| $f_{\text {baud }}$ | Baud rate | 64 Gbaud |
| $f_{t}$ | Technology transit frequency | $5 \times f_{\text {baud }}$ |
| $g_{m}$ | TIA combined transconductance | $10-120 \mathrm{mS}$ |
| $R_{F}$ | TIA Feedback resistor | $100-4000 \Omega$ |
| $C_{g}$ | TIA Gate capacitance | $C_{g}=g_{m} /\left(2 \pi f_{t}\right)$ |
| ${ }^{1} C_{g s}$ | TIA Gate-to-source capacitance | $2 / 3 C_{g}$ |
| ${ }^{1} C_{g d}$ | TIA Gate-to-drain capacitance | $1 / 3 C_{g}$ |
| $C_{a}$ | TIA Output capacitance | 25 fF |
| $A$ | TIA Inverter gain | $6 \mathrm{~V} / \mathrm{V}$ |
| $R_{a}$ | TIA output resistance | $A / g_{m}$ |

${ }^{1}$ Assumed based on simulations reported in [24].

## C. Modeling the TIA

The input of the AFE is a TIA that follows the T-coil and converts the input photocurrent into a voltage. A commonly used TIA architecture is the inverter-based shunt-feedback TIA, such as the $128 \mathrm{~Gb} / \mathrm{s}$ PAM-4 linear TIA in [21], the $64 \mathrm{~Gb} / \mathrm{s}$ PAM-4 TIA in [22], the $53 \mathrm{~Gb} / \mathrm{s}$ TIA in [23], and the $64 \mathrm{~Gb} / \mathrm{s}$ NRZ TIA in [24]. Thus, we use it here. An inverter-based TIA consists of an inverter with a shunt-feedback resistor that converts the input current to an output voltage, as shown in Fig. 2. Inverter-based TIAs are simple to implement in CMOS technology, allowing them to be integrated alongside DSP equalizers, and have been used in optical receivers at $100 \mathrm{~Gb} / \mathrm{s}$ and beyond (e.g., [1]). The small-signal model is shown in Fig. 2. The design parameters of the TIA (which we assume to be designed in 14 nm CMOS FinFET [24]) are the transistor widths and the feedback resistance. In the small-signal model, several parameters scale with transistor width and are therefore coupled, namely, the transconductance, $g_{m}$, the gate-to-source capacitance, $C_{\mathrm{gs}}$, the gate-to-drain capacitance, $C_{\mathrm{gd}}$, and the equivalent output resistance of the TIA, $R_{a}$. The value of $g_{m}$ is related to the gate capacitance, $C_{g}$, by the cutoff frequency of the technology node. We assume that the transistors are in deep inversion and that the ratio $C_{\mathrm{gs}} / C_{\mathrm{gd}}=2$ (i.e., $C_{\mathrm{gs}}=2 / 3\, C_{g}$ and $C_{\mathrm{gd}}=1 / 3\, C_{g}$), based on [24]. We make this assumption because $V_{t}$ (threshold voltage) values are usually significantly below $V_{\mathrm{gs}}=0.5 \mathrm{~V}$ in our inverter-based TIA feedback configuration. However, adjusting the $C_{\mathrm{gs}} / C_{\mathrm{gd}}$ ratio is important if $V_{t}$ approaches $V_{\mathrm{gs}}$.

Table II gives the numerical values used in this study alongside the relationships between coupled parameters. We note that $C_{a}$ represents the combination of the output capacitance of the TIA and the input capacitance of the following stage. For the purposes of this study, we take $g_{m}$ to be the design variable proportional to transistor width, while the other parameters scale with it according to Table II. Moreover, given the limited input current swing assumed and the good linearity of a one-stage inverter-based TIA, we ignore nonlinear nonidealities.
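The coupling relationships in Table II can be collected into a single helper that maps the design variable $g_{m}$ to the remaining small-signal parameters. The sketch below simply transcribes the table; the function name `tia_params` and the dictionary keys are hypothetical.

```python
import math

F_BAUD = 64e9         # baud rate (Table II)
F_T = 5 * F_BAUD      # f_t = 5 x f_baud (Table II)
A_INV = 6.0           # inverter gain in V/V (Table II)
C_A = 25e-15          # TIA output capacitance (Table II)

def tia_params(gm):
    """Derive the coupled small-signal parameters from g_m (in siemens)."""
    cg = gm / (2 * math.pi * F_T)     # C_g = g_m / (2*pi*f_t)
    return {
        "Cg": cg,
        "Cgs": 2 * cg / 3,            # C_gs = 2/3 C_g
        "Cgd": cg / 3,                # C_gd = 1/3 C_g
        "Ra": A_INV / gm,             # R_a = A / g_m
        "Ca": C_A,
    }

p = tia_params(120e-3)   # upper end of the g_m range in Table II
```

For $g_{m}=120 \mathrm{mS}$, this gives $C_{g} \approx 59.7 \mathrm{fF}$ and $R_{a}=50 \Omega$.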

The parameters of the ABCD matrix of the TIA, $T_{\text {TIA }}$, are expressed by the following set of equations:

$$
\begin{equation*}
A_{\mathrm{TIA}}=\frac{R_{a}+R_{F}+s R_{a} R_{F}\left(C_{a}+C_{\mathrm{gd}}\right)}{R_{a}\left(s C_{\mathrm{gd}} R_{F}-R_{F} g_{m}+1\right)} \tag{7}
\end{equation*}
$$



Fig. 10. Optimization flow.

$$
\begin{align*}
B_{\mathrm{TIA}}= & \frac{-R_{F}}{g_{m} R_{F}-s C_{\mathrm{gd}} R_{F}-1}  \tag{8}\\
C_{\mathrm{TIA}}= & \frac{\left(s C_{\mathrm{gd}} R_{F}+1\right)\left(R_{a} g_{m}+s C_{a} R_{a}+1\right)}{R_{a}\left(s C_{\mathrm{gd}} R_{F}-R_{F} g_{m}+1\right)} \\
& +\frac{s C_{\mathrm{gs}}\left(R_{a}+R_{F}+s C_{a} R_{a} R_{F}+s C_{\mathrm{gd}} R_{a} R_{F}\right)}{R_{a}\left(s C_{\mathrm{gd}} R_{F}-R_{F} g_{m}+1\right)}  \tag{9}\\
D_{\mathrm{TIA}}= & \frac{s C_{\mathrm{gd}} R_{F}+s C_{\mathrm{gs}} R_{F}+1}{s C_{\mathrm{gd}} R_{F}-R_{F} g_{m}+1} . \tag{10}
\end{align*}
$$
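One way to cross-check (7)-(10) is to write the admittance (Y) parameters of the small-signal model in Fig. 2 by inspection and convert them to ABCD form with the standard two-port conversion. The sketch below does both and can be used to confirm that the two forms agree numerically. The function names are hypothetical, and the Y-parameter form assumes the topology described in the text: $C_{\mathrm{gs}}$ at the input, $R_{F}$ in parallel with $C_{\mathrm{gd}}$ in feedback, and $R_{a}$ in parallel with $C_{a}$ at the output.

```python
import numpy as np

def tia_abcd_closed_form(s, gm, rf, cgs, cgd, ra, ca):
    """ABCD entries transcribed from Eqs. (7)-(10)."""
    den = s * cgd * rf - rf * gm + 1
    a = (ra + rf + s * ra * rf * (ca + cgd)) / (ra * den)
    b = -rf / (gm * rf - s * cgd * rf - 1)
    c = ((s * cgd * rf + 1) * (ra * gm + s * ca * ra + 1)
         + s * cgs * (ra + rf + s * ca * ra * rf + s * cgd * ra * rf)) / (ra * den)
    d = (s * cgd * rf + s * cgs * rf + 1) / den
    return np.array([[a, b], [c, d]], dtype=complex)

def tia_abcd_from_y(s, gm, rf, cgs, cgd, ra, ca):
    """Same two-port, built from Y-parameters written by inspection."""
    yf = 1 / rf + s * cgd             # feedback admittance
    y11 = s * cgs + yf
    y12 = -yf
    y21 = gm - yf
    y22 = 1 / ra + s * ca + yf
    det_y = y11 * y22 - y12 * y21
    # Standard Y-to-ABCD conversion
    return np.array([[-y22, -1.0], [-det_y, -y11]], dtype=complex) / y21
```

At dc, $1 / C_{\mathrm{TIA}}$ reduces to $\left(R_{a}-A R_{F}\right) /(1+A)$ with $A=g_{m} R_{a}$, the familiar shunt-feedback transimpedance.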

The ABCD matrix of the series connection of all the elements from the PD to the output of TIA is given by the following:

$$
\begin{equation*}
T_{\text {link }}=T_{\mathrm{PD}} T_{\text {bump }} T_{\mathrm{TL}} T_{\text {bump }, \mathrm{rx}} T_{\mathrm{PAD}} T_{\mathrm{Tcoil}} T_{\mathrm{TIA}} \tag{11}
\end{equation*}
$$

From $T_{\text {link }}$, we are interested in the transimpedance from the PD current to the voltage output of the TIA. This transfer function is given by $H(f)=1 / C_{\text {link }}$, where $C_{\text {link }}$ is the $C$ parameter of the $T_{\text {link }}$ matrix. The impulse response, $h$, is obtained by taking the inverse Fourier transform of $H$. Finally, the pulse response, $h_{\text {pulse }}$, is obtained by convolving the impulse response with an input current pulse of 1 UI duration with 6 ps rise and fall times.
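The chain from ABCD matrices to a pulse response can be sketched as follows. This is a simplified illustration, not the released model: only the PD and pad matrices from (1) and (4) are included, and an ideal $50 \Omega$ shunt resistor (`t_load`) stands in for the transmission line, T-coil, and TIA, whose matrices come from EM simulation, the NN, and (7)-(10). The oversampling ratio, the time window, and the interpretation of the 6 ps rise time as a linear 0-100% edge are assumptions.

```python
import numpy as np

F_BAUD = 64e9      # symbol rate from the text
OSR = 16           # samples per UI (assumed)
N_UI = 64          # simulated window in UI (assumed)
I_PP = 100e-6      # peak-to-peak photocurrent from the text

n = OSR * N_UI
dt = 1.0 / (F_BAUD * OSR)
f = np.fft.rfftfreq(n, dt)
s = 2j * np.pi * f

def abcd(a, b, c, d):
    """Stack per-frequency entries into an array of shape (..., 2, 2)."""
    a, b, c, d = np.broadcast_arrays(a, b, c, d)
    rows = [np.stack([a, b], axis=-1), np.stack([c, d], axis=-1)]
    return np.stack(rows, axis=-2).astype(complex)

t_pd = abcd(1.0, 87.0, s * 10.7e-15, 1.0 + s * 87.0 * 10.7e-15)   # Eq. (1)
t_pad = abcd(1.0, 0.0, s * 100e-15, 1.0)                          # Eq. (4)
t_load = abcd(1.0, 0.0, 1.0 / 50.0, 1.0)   # stand-in for the remaining blocks

t_link = t_pd @ t_pad @ t_load             # cascade, as in Eq. (11)
H = 1.0 / t_link[:, 1, 0]                  # transimpedance H = 1 / C_link

h = np.fft.irfft(H, n) / dt                # sampled impulse response

# 1 UI current pulse with approximately 6 ps linear edges
edge = max(1, round(6e-12 / dt))
pulse = I_PP * np.convolve(np.ones(OSR), np.ones(edge) / edge)
h_pulse = np.convolve(h, pulse)[:n] * dt   # pulse response in volts
```

Dividing the inverse FFT by `dt` and multiplying the convolution by `dt` keeps the discrete operations consistent with their continuous-time counterparts, so `h_pulse` is directly in volts.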

## D. Modeling Receiver Noise

The pulse response captures the time-domain behavior of the system, including reflections. However, it does not consider other signal impairments, such as noise or jitter. We describe how jitter is taken into account in Section III. The noise contributions arise from the feedback resistance, $R_{F}$, with current noise $I_{n, R_{F}}^{2}$, and from the MOS channel thermal noise of the TIA transistors, $I_{n, g_{m}}^{2}$. The noise variances at the output of the TIA from each of these noise sources are given by the following expressions:

$$
\begin{align*}
v_{n, R_{F}}^{2} & =\int_{0}^{\infty} I_{n, R_{F}}^{2}\left|\frac{Z_{a} \times Z_{f} \times\left(1+g_{m} \times Z_{\text {in }}\right)}{Z_{f}+Z_{\text {in }}+\left(1+g_{m} \times Z_{\text {in }}\right) \times Z_{a}}\right|^{2} d f  \tag{12}\\
v_{n, g_{m}}^{2} & =\int_{0}^{\infty} I_{n, g_{m}}^{2}\left|\frac{Z_{a} \times\left(Z_{f}+Z_{\text {in }}\right)}{Z_{f}+Z_{\text {in }}+\left(1+g_{m} \times Z_{\text {in }}\right) \times Z_{a}}\right|^{2} d f . \tag{13}
\end{align*}
$$

In these expressions, $Z_{\text {in }}$ refers to the impedance looking into the T-coil, including $C_{\mathrm{gs}}$ of the TIA ($Z_{\text {in }}$ in Fig. 2), $Z_{f}$ is the parallel combination of the feedback resistor, $R_{F}$, and the gate-to-drain capacitance, $C_{\mathrm{gd}}$ ($Z_{f}$ in Fig. 2), and $Z_{a}$ is the equivalent output impedance of the TIA, i.e., the parallel combination of $R_{a}$ and $C_{a}$ ($Z_{a}$ in Fig. 2). Finally, $I_{n, R_{F}}^{2}=4 k T / R_{F}$ and $I_{n, g_{m}}^{2}=4 k T \gamma g_{m}$, where $k$ is the Boltzmann constant, $T=300 \mathrm{~K}$ is the temperature, and $\gamma=2$.

The total noise variance at the output of the TIA, $\sigma_{n, \text { TIA }}^{2}$, is the sum of (12) and (13).
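The integrals in (12) and (13) can be evaluated numerically once $Z_{\text {in }}$ is known over frequency. In the sketch below, the function names are hypothetical, the integration band is truncated to a finite grid, and $Z_{\text {in }}$ is supplied by the caller. In the full model it would come from the cascaded network, whereas the test case uses a single lumped capacitor purely as a placeholder.

```python
import numpy as np

K_B = 1.380649e-23   # Boltzmann constant, J/K
TEMP = 300.0         # temperature in kelvin (from the text)
GAMMA = 2.0          # channel noise factor (from the text)

def noise_gains(f, gm, rf, cgd, ra, ca, z_in):
    """Transfer functions inside the magnitude bars of Eqs. (12) and (13)."""
    s = 2j * np.pi * f
    z_f = rf / (1 + s * rf * cgd)     # R_F in parallel with C_gd
    z_a = ra / (1 + s * ra * ca)      # R_a in parallel with C_a
    den = z_f + z_in + (1 + gm * z_in) * z_a
    g_rf = z_a * z_f * (1 + gm * z_in) / den
    g_gm = z_a * (z_f + z_in) / den
    return g_rf, g_gm

def tia_output_noise_var(f, gm, rf, cgd, ra, ca, z_in):
    """Sum of Eqs. (12) and (13), integrated with the trapezoidal rule."""
    g_rf, g_gm = noise_gains(f, gm, rf, cgd, ra, ca, z_in)
    psd = (4 * K_B * TEMP / rf) * np.abs(g_rf) ** 2 \
        + (4 * K_B * TEMP * GAMMA * gm) * np.abs(g_gm) ** 2
    return float(np.sum(0.5 * (psd[1:] + psd[:-1]) * np.diff(f)))
```

When $Z_{\text {in }}$ is large, the two gains approach $g_{m} R_{a} R_{F} /\left(1+g_{m} R_{a}\right)$ and $R_{a} /\left(1+g_{m} R_{a}\right)$, which is a quick sanity check on the expressions.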

## E. Feed-Forward Equalizer (FFE) and Decision-Feedback Equalizer (DFE)

An FFE and a DFE follow the AFE. In this model, the output of the TIA is connected directly to an equalizer. This simplification is made in order to study the achievable signal-to-noise ratio (SNR). In a real system, however, a variable gain amplifier follows the TIA to achieve higher gain and condition the signal for sampling by the ADC or DFE slicers. Moreover, in our model, we have assumed that there is an ideal ADC after the TIA and before the FFE. This allows for using a long-tap FFE. Here, we assume that the number of taps is given, but the tap weights are free parameters. While it is possible to include tap weights as part of the global optimization, for a typically large number of FFE tap weights (over 10), the optimization can become intractable. Instead, we choose to select the minimum-mean-square-error (MMSE) FFE and DFE tap weights based on the pulse response and noise under consideration.

Specifically, the MMSE FFE tap weights, $\Phi$, are as follows [25]:

$$
\begin{equation*}
\Phi=Y_{\mathrm{des}, \Delta}^{T} H^{T}\left(H P H^{T}+\sigma_{n, \mathrm{TIA}}^{2} I\right)^{-1} \tag{14}
\end{equation*}
$$

where $Y_{\text {des, } \Delta}$ is the desired pulse response having $\Delta$ UI delay, $H$ is the channel pulse response matrix, $P$ is the diagonal matrix whose diagonal is ones except for the $K$ (number of DFE taps) entries after the main cursor that are set to zero, and $I$ is the identity matrix.

The MMSE DFE taps are then calculated as follows:

$$
\begin{equation*}
B=\Phi H J \tag{15}
\end{equation*}
$$

where $J$ is a vector of zeros with the same length as $Y_{\text {des }, \Delta}$ except for the $K$ entries after the main cursor, which are equal to one.
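The structure of the MMSE solution can be illustrated with a textbook formulation in NumPy. The sketch below is not the authors' implementation: it assumes white noise and unit symbol power (the article works with the colored noise at the TIA output), and the names `conv_matrix`, `mmse_ffe_dfe`, and `best_delay` are hypothetical. The matrix `H` is built so that `w @ H` is the equalized pulse response, `p` plays the role of the diagonal of $P$, and the delay sweep uses the standard unbiased MMSE SNR, $1 /(1-c[\Delta])-1$, where $c[\Delta]$ is the main cursor of the equalized response.

```python
import numpy as np

def conv_matrix(h, n_taps):
    """Rows are shifted copies of h, so that w @ H equals np.convolve(w, h)."""
    H = np.zeros((n_taps, n_taps + len(h) - 1))
    for i in range(n_taps):
        H[i, i:i + len(h)] = h
    return H

def mmse_ffe_dfe(h, n_ffe, n_dfe, delay, noise_var):
    """MMSE FFE taps, DFE taps, and unbiased SNR (white-noise assumption)."""
    H = conv_matrix(np.asarray(h, dtype=float), n_ffe)
    n = H.shape[1]
    y = np.zeros(n)
    y[delay] = 1.0                                # desired response: unit main cursor
    p = np.ones(n)
    p[delay + 1: delay + 1 + n_dfe] = 0.0         # post-cursors left to the DFE
    A = (H * p) @ H.T + noise_var * np.eye(n_ffe) # H P H^T + sigma^2 I
    w = np.linalg.solve(A, H @ y)                 # A is symmetric
    c = w @ H                                     # equalized pulse response
    b = c[delay + 1: delay + 1 + n_dfe]           # DFE taps cancel these samples
    snr = 1.0 / (1.0 - c[delay]) - 1.0            # unbiased MMSE SNR (linear)
    return w, b, snr

def best_delay(h, n_ffe, n_dfe, noise_var):
    """Sweep the cursor delay and keep the one with the highest unbiased SNR."""
    n = n_ffe + len(h) - 1
    return max(range(n - n_dfe),
               key=lambda d: mmse_ffe_dfe(h, n_ffe, n_dfe, d, noise_var)[2])
```

For an ideal one-tap channel, the unbiased SNR reduces to $1 / \sigma^{2}$, which is a useful check that the bias removal is handled correctly.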

The delay, $\Delta$, controls the number of FFE precursor taps and should be selected for optimal performance. To find the optimal number of FFE precursor taps, we sweep $\Delta$ to maximize the unbiased MMSE SNR given by the following [25]:

$$
\begin{equation*}
\mathrm{SNR}_{\mathrm{UMMSE}}=\frac{1}{1-\Phi H Y_{\mathrm{des}, \Delta}}-1 . \tag{16}
\end{equation*}
$$

TABLE III
Optimization Results Assuming 32 FFE Taps and 4 DFE Taps

| $\mathrm{TL}(\mu \mathrm{m})$ | $\mathrm{TW}(\mu \mathrm{m})$ | $\mathrm{L}(\mu \mathrm{m})$ | $\mathrm{W}(\mu \mathrm{m})$ | $N_{\text {IN }}$ | $N_{\text {OUT }}$ | $g_{m}(\mathrm{mS})$ | $R_{F}(\Omega)$ | $\sigma_{\text {ISI }}\left(m V_{\text {rms }}\right)$ | $\sigma_{n, \text { output }}\left(m V_{\text {rms }}\right)$ | $\sigma_{\text {jitter }}\left(m V_{\text {rms }}\right)$ | $\mathrm{FoM}(\mathrm{dB})$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 250 | 15 | 43 | 4.2 | 11 | 6 | 120 | 2514 | 0.257 | 0.198 |  |  |
| 500 | 15 | 44 | 4.2 | 11 | 5 | 120 | 2508 | 0.236 | 0.738 | 0.741 | 0.125 |
| 5000 | 15 | 43 | 4.2 | 11 | 6 | 120 | 1143 | 0.226 | 0.741 | 0.37 |  |

TABLE IV
Optimization Results Assuming 6 FFE Taps and 2 DFE Taps

| $\mathrm{TL}(\mu \mathrm{m})$ | $\mathrm{TW}(\mu \mathrm{m})$ | $\mathrm{L}(\mu \mathrm{m})$ | $\mathrm{W}(\mu \mathrm{m})$ | $N_{\text {IN }}$ | $N_{\text {OUT }}$ | $g_{m}(\mathrm{mS})$ | $R_{F}(\Omega)$ | $\sigma_{\text {ISI }}\left(m V_{\text {rms }}\right)$ | $\sigma_{n, \text { output }}\left(m V_{\text {rms }}\right)$ | $\sigma_{\text {jitter }}\left(m V_{\text {rms }}\right)$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 250 | 15 | 42 | 4.2 | 12 | 7 | 114 | 1762 | 0.297 | 0.750 | 0.2 |
| 500 | 15 | 34 | 2.4 | 10 | 5 | 116 | 2215 | 0.317 | 0.762 | 0.128 |
| 5000 | 80 | 37 | 5 | 18 | 6 | 108 | 609 | 0.382 | 0.89 |  |

TABLE V
Optimization Results Assuming No Equalization

| $\mathrm{TL}(\mu \mathrm{m})$ | $\mathrm{TW}(\mu \mathrm{m})$ | $\mathrm{L}(\mu \mathrm{m})$ | $\mathrm{W}(\mu \mathrm{m})$ | $N_{\text {IN }}$ | $N_{\text {OUT }}$ | $g_{m}(\mathrm{mS})$ | $R_{F}(\Omega)$ | $\sigma_{\text {ISI }}\left(m V_{\text {rms }}\right)$ | $\sigma_{n, \text { output }}\left(m V_{\text {rms }}\right)$ | $\sigma_{\text {jitter }}\left(m V_{\text {rms }}\right)$ | $\mathrm{FoM}(\mathrm{dB})$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 250 | 70 | 43 | 4.2 | 10 | 7 | 112 | 176 | 0.219 | 0.135 |  |  |
| 500 | 15 | 49 | 5 | 11 | 5 | 104 | 191 | 0.215 | 0.604 | 0.63 | 0.089 |
| 5000 | 100 | 32 | 4.2 | 6 | 11 | 80 | 142 | 0.649 | 0.09 |  |  |

## III. Optimization Procedure

To optimize the interface, we must define a signal integrity criterion that reflects the quality of the signal at the receiver's output and takes impairments, such as reflections, jitter, and noise, into account. Therefore, we opt to define a signal integrity figure of merit (FoM) that can be calculated statistically and correlates with bit error rate (BER). Statistical analysis of high-speed serial links provides an efficient way to evaluate performance since it relies on the pulse response of the channel, rather than on a large number of randomly generated bit patterns. Because the FoM is calculated statistically, it can be evaluated rapidly within the proposed optimization approach, allowing for faster convergence on an optimal design.

Specifically, the FoM is defined as follows:

$$
\begin{equation*}
\mathrm{FoM}=10 \log _{10} \frac{A_{\text {signal }}^{2}}{\sigma_{\mathrm{ISI}}^{2}+\sigma_{n, \text { output }}^{2}+\sigma_{\mathrm{jitter}}^{2}} \tag{17}
\end{equation*}
$$

where $\sigma_{\text {ISI }}$ represents the residual ISI at the output of the equalizer, $\sigma_{n, \text { output }}$ is the rms voltage noise at the output of the equalizer, and $\sigma_{\mathrm{jitter}}$ represents the rms jitter-to-amplitude voltage conversion. The term $A_{\text {signal }}$ is calculated from the pulse response as follows: assuming the equalized pulse response, $h_{\text {pulse,eq }}$, is baud rate sampled with $O$ samples, and that the index of the main (max) cursor is zero, then $A_{\text {signal }}=1 / 3 \times h_{\text {pulse,eq }}(0)$. The peak of the pulse response, $h_{\text {pulse,eq }}(0)$, represents the peak-to-peak amplitude of the modulated and equalized signal. This FoM is a form of SNR, and a higher FoM corresponds to a better BER.
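As a rough illustration of how (17) combines its terms, the following Python sketch evaluates the FoM from a main-cursor value and the three rms impairment terms. The function names and the numerical inputs are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the optimization results.

```python
import math


def a_signal(h_main_cursor):
    """Signal amplitude for 4-PAM: one third of the main-cursor value."""
    return h_main_cursor / 3.0


def fom_db(h_main_cursor, sigma_isi, sigma_noise, sigma_jitter):
    """Eq. (17): signal power over the summed impairment powers, in dB."""
    signal_power = a_signal(h_main_cursor) ** 2
    impairment_power = sigma_isi**2 + sigma_noise**2 + sigma_jitter**2
    return 10.0 * math.log10(signal_power / impairment_power)


# Hypothetical inputs (all in the same voltage unit): main cursor of 3.0,
# rms ISI of 0.3, rms noise of 0.4, and no jitter contribution.
example_fom = fom_db(3.0, 0.3, 0.4, 0.0)
print(round(example_fom, 2))  # about 6.02 dB, since the power ratio is 4
```

Because the three impairment terms add as powers, the largest of them dominates the FoM, which is why the optimizer can trade one impairment against another.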

The residual ISI power, $\sigma_{\text {ISI }}^{2}$, is also calculated from the pulse response as follows:

$$
\begin{equation*}
\sigma_{\mathrm{ISI}}^{2}=\frac{5}{9} \sum_{n \neq 0}^{O} h_{\text {pulse,eq }}^{2}(n) \tag{18}
\end{equation*}
$$

where the factor $5 / 9$ takes into account the differing ISI contributed by different 4-PAM symbol amplitudes [26].
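A minimal sketch of (18) is given below, assuming the equalized pulse response is already available as a list of baud-rate samples. The function name, the sample values, and the position of the main cursor are illustrative assumptions.

```python
def residual_isi_variance(h_eq, main_index):
    """Eq. (18): 5/9 times the energy of every cursor except the main one."""
    return (5.0 / 9.0) * sum(
        h * h for n, h in enumerate(h_eq) if n != main_index
    )


# Hypothetical baud-rate samples with the main cursor at index 2.
h_example = [0.0, 0.3, 3.0, 0.6, 0.0]
isi_variance = residual_isi_variance(h_example, main_index=2)
print(isi_variance)  # approximately 0.25, i.e., an rms ISI of about 0.5
```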

Noise variance at the output of the equalizer is calculated using the $M$ calculated FFE tap weights, $w_{1}, w_{2}, \ldots, w_{M}$, and the autocorrelation of the noise at the output of the TIA, $R$, as follows:

$$
\begin{equation*}
\sigma_{n, \text { output }}^{2}=\sum_{i=1}^{M} \sum_{k=1}^{M} w_{i} w_{k} R\left(\frac{|i-k|}{f_{\mathrm{baud}}}\right) \tag{19}
\end{equation*}
$$

where the noise autocorrelation is calculated by taking the inverse Fourier transform of the noise power spectral density obtained by adding the operands of (12) and (13) [25].
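The double sum in (19) can be sketched directly in Python. Here the autocorrelation is passed in as a callable of the time lag; the function name, the tap weights, the baud rate, and the white-noise autocorrelation used in the example are assumptions made for illustration only.

```python
def equalized_noise_variance(weights, autocorr, f_baud):
    """Eq. (19): sum over i and k of w_i * w_k * R(|i - k| / f_baud)."""
    total = 0.0
    for i, w_i in enumerate(weights):
        for k, w_k in enumerate(weights):
            total += w_i * w_k * autocorr(abs(i - k) / f_baud)
    return total


# Hypothetical white noise: variance 0.04 at zero lag, uncorrelated otherwise.
def white_noise_autocorr(lag):
    return 0.04 if lag == 0 else 0.0


# Two hypothetical FFE taps at an assumed 50 GBd symbol rate.
noise_variance = equalized_noise_variance([1.0, -0.5], white_noise_autocorr, 50e9)
print(noise_variance)  # approximately 0.05 = 0.04 * (1.0**2 + 0.5**2)
```

For white noise, the expression reduces to the input variance scaled by the sum of the squared tap weights, which illustrates how FFE taps enhance noise while they remove ISI.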

Jitter causes eye height to fluctuate around the sampling point. Thus, jitter translates into amplitude noise, reducing signal integrity. In other words, when the signal is jittery, the location of the peak of the signal changes with respect to the sampling time. This means that, if the sampling phase is fixed, the signal is sampled off-peak when there is jitter, which degrades the eye height. The jitter-to-amplitude conversion variance, $\sigma_{\text {jitter }}^{2}$, is given by [26]

$$
\begin{equation*}
\sigma_{\mathrm{jitter}}^{2}=\frac{5}{9} \sigma_{j}^{2} \sum_{n}^{O} \mu^{2}(n) \tag{20}
\end{equation*}
$$

where $\sigma_{j}$ is the rms jitter, and $\mu$ is the slope of equalized pulse response at the sampling points. The value of $\sigma_{j}$ is assumed to be 0.015 UI. In this equation, the factor $5 / 9$ accounts for the density of 4-PAM transitions. The amount of eye height variation equals the amount of time variation times the slope. The quantities are squared to make them power quantities.
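The following sketch shows one way (20) could be evaluated from an oversampled pulse response. The central-difference slope estimate, the function names, and the ramp-shaped test waveform are assumptions; only the factor $5/9$ and the value $\sigma_{j}=0.015$ UI come from the text. Slopes are expressed per UI so that they multiply directly with a jitter value given in UI.

```python
def slopes_at_sampling_points(h, samples_per_ui, phase):
    """Central-difference slope (per UI) at each baud-rate sampling instant."""
    slopes = []
    for i in range(phase, len(h), samples_per_ui):
        if 0 < i < len(h) - 1:  # skip edges where no central difference exists
            slopes.append((h[i + 1] - h[i - 1]) * samples_per_ui / 2.0)
    return slopes


def jitter_variance(slopes, sigma_j):
    """Eq. (20): 5/9 * sigma_j^2 * sum of squared slopes at the sampling points."""
    return (5.0 / 9.0) * sigma_j**2 * sum(mu * mu for mu in slopes)


# Hypothetical waveform: a ramp rising 0.1 per sample, 4 samples per UI,
# which corresponds to a slope of 0.4 per UI at every sampling point.
ramp = [0.1 * i for i in range(9)]
ramp_slopes = slopes_at_sampling_points(ramp, samples_per_ui=4, phase=2)
ramp_jitter_variance = jitter_variance(ramp_slopes, sigma_j=0.015)
```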

For this optimization, we use the genetic algorithm (GA) with the flow shown in Fig. 10. In this flow, an initial population of 200 sets of design parameters ($\mathrm{TW}$, $W$, $L$, $N_{\text {in }}$, $N_{\text {out }}$, $g_{m}$, and $R_{F}$) is randomly generated. Pulse responses ($h_{\text {pulse }}$) and noise variances at the output of the TIA ($\sigma_{n, \text { TIA }}^{2}$) are calculated for each set of design parameters as described in Section II. Using this information, the MMSE tap weights are calculated.


Fig. 11. Eye diagrams, at the output of the equalizer, obtained using optimal design values, assuming 32 FFE taps and 4 DFE taps, at (a) $\mathrm{TL}=250 \mu \mathrm{~m}$. (b) $\mathrm{TL}=500 \mu \mathrm{~m}$. (c) $\mathrm{TL}=5 \mathrm{~mm}$. The contour shown corresponds to a BER $=2.4 \times 10^{-4}$.

The equalized pulse responses can then be calculated using the tap weights. Noise variances are also referred to the output of the equalizer through (19). The FoM for each design parameter set is calculated. A new generation is created through selection, crossover, and mutation of the best current design parameter sets.


Fig. 12. FoM versus the number of FFE taps.


Fig. 13. Optimal TW versus the number of FFE taps for 5 mm .

The process repeats until 100 consecutive generations produce no improvement in FoM. The same seed is used for every optimization run to ensure repeatable results. Finally, the set of parameter values corresponding to the best achieved FoM is selected as the optimal design. In this optimization scheme, the power consumption can be controlled by limiting the range of $g_{m}$ and the number of equalizer taps. To ensure the practical utility of the optimizer, it is necessary to model each component in the link accurately. Foundry-provided models are used for the PD and to train the T-coil modeling agent. The overall optimization process takes around 29 min on a computer with the following specification: Intel Core i7-8750H @ 2.20 GHz CPU, 2666 MHz 16 GB SDRAM, and a 256 GB PCIe SSD.
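The GA flow described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function name `genetic_optimize`, the truncation selection, the uniform crossover, the resampling mutation, and the mutation rate are all assumptions, and `fom` stands in for the full chain of pulse-response, noise, and MMSE calculations. Only the population size of 200, the fixed seed, and the stopping rule of 100 generations without improvement are taken from the text.

```python
import random


def genetic_optimize(fom, bounds, pop_size=200, patience=100,
                     mutation_rate=0.1, seed=0):
    """Maximize fom(params) within box bounds.

    Stops once `patience` consecutive generations bring no improvement.
    """
    rng = random.Random(seed)  # fixed seed for repeatable runs

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best, best_fom, stalled = None, float("-inf"), 0

    while stalled < patience:
        ranked = sorted(population, key=fom, reverse=True)
        top_fom = fom(ranked[0])
        if top_fom > best_fom:
            best, best_fom, stalled = ranked[0], top_fom, 0
        else:
            stalled += 1

        parents = ranked[: pop_size // 4]  # selection: keep the best quarter
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each parameter comes from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally resample a parameter within its bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        population = parents + children

    return best, best_fom


# Toy stand-in for the FoM (hypothetical): a single peak at (3, -1).
def toy_fom(p):
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)


best, best_fom = genetic_optimize(toy_fom, [(-10, 10), (-10, 10)],
                                  pop_size=60, patience=30)
```

In practice, limiting the bounds passed for $g_{m}$ is one way such a loop can constrain power consumption, as noted above.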

## IV. Optimization Results and Discussion

The design was optimized for three transmission line lengths: $250 \mu \mathrm{~m}, 500 \mu \mathrm{~m}$, and 5 mm . Table III gives optimal design values, assuming 32 FFE taps and 4 DFE taps, and the corresponding FoM. Table IV gives the results assuming 6 FFE taps and 2 DFE taps, while Table V gives results with no equalization. We note that while the optimal transistor sizes in Table III were the largest permitted in our analysis for all three cases considered here (corresponding to largest $g_{m}$ ), this was not the case in trials where there were fewer taps of equalization (Tables IV and V). The likely reason why a large $g_{m}$ is optimal is that when the $g_{m}$ value is high, the value of $R_{a}$ is low, resulting in a high-frequency output pole and allowing for a high $R_{F}$, which results in a higher gain. Although a large $g_{m}$ results in a larger $C_{g}$, lowering the input pole, the T-coil offsets this negative impact. Thus, a large $g_{m}$ is more favorable overall. Fig. 11 shows the eye diagrams obtained with this optimization (see Table III), including impairments. As these figures show, there is a good eye opening in all three cases. A contour corresponding to a $\mathrm{BER}=2.4 \times 10^{-4}$ is also shown on the eye diagram. This validates that the proposed optimization approach converges on designs with good eye opening and low BER.

Fig. 14. Pulse responses for $\mathrm{TL}=5 \mathrm{~mm}$. (a) Pulse response obtained with optimal design values assuming no equalization. (b) Pulse responses at the output of the TIA, obtained with optimal design values assuming there is a 16-tap FFE. (c) Pulse responses at the output of the FFE equalizer, obtained with optimal design values assuming there is a 16-tap FFE.

Fig. 12 shows a plot of FoM versus the number of FFE taps for all three lengths of transmission lines. As can be seen, FoM increases with the number of FFE taps. Moreover, the first few taps result in a significant improvement in FoM, with diminishing returns as the number of FFE taps increases beyond about six. This is particularly true for the long 5 mm transmission line, which benefits significantly from a few equalization taps. With sufficient equalization, the FoM for the 5 mm transmission line is on par with the 250 and $500 \mu \mathrm{~m}$ transmission lines.

In addition to optimizing designs, the proposed approach can be used to gain insight into optimal design guidelines. Here, we explore the relationship between the optimal transmission line width, TW, which controls the characteristic impedance of the transmission line, and the amount of available equalization in the case of the long 5 mm transmission line that can exhibit strong reflections.

We use the optimization platform to obtain optimal design values for various numbers of FFE taps with no DFE taps. Fig. 13 shows the optimal transmission line width versus the number of FFE taps. We see that with little or no equalization, the optimal transmission line width is wide and tends to narrow with the increasing number of FFE taps. To explain this behavior, we look at $h_{\text {pulse }}$, shown in Fig. 14, in two cases: with no equalization and with 16 FFE taps. In both cases, we use the optimal design values for each. With no equalization, the pulse response shown in Fig. 14(a) is obtained. This pulse response shows little to no reflections. In this case, a wide transmission line is preferred to avoid reflections that manifest as ISI. In other words, the optimizer chooses to achieve impedance matching between the transmission line and the input of the TIA to avoid reflections. To verify this, we inspected the input impedance of the TIA and compared it with the transmission line's characteristic impedance. The value of the transmission line's characteristic impedance is around $37 \Omega$, while the value of the input impedance is $32 \Omega$, confirming the close matching.
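To put the quoted impedances in perspective, the standard voltage reflection coefficient $\Gamma=(Z_{L}-Z_{0}) /(Z_{L}+Z_{0})$ can be evaluated as in the short sketch below. Treating both impedances as purely real, frequency-independent values is a simplifying assumption made here, and the function names are illustrative.

```python
import math


def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient for a load z_load on a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)


def return_loss_db(gamma):
    """Return loss in dB for a reflection coefficient of magnitude |gamma|."""
    return -20.0 * math.log10(abs(gamma))


# Values quoted in the text: 32-ohm TIA input on a line of about 37 ohms.
gamma = reflection_coefficient(32.0, 37.0)
print(round(gamma, 3))                  # about -0.072
print(round(return_loss_db(gamma), 1))  # about 22.8 dB
```

Under these assumptions, only about 7% of the incident voltage is reflected at the TIA input, consistent with the nearly reflection-free pulse response in Fig. 14(a).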

On the other hand, a narrow transmission line is preferred in the case of 16 FFE taps. To explain this, we look at the pulse response at the output of the TIA shown in Fig. 14(b). Here we see a lot of reflections due to a large impedance mismatch between the characteristic impedance of the transmission line and the input of the TIA. However, when inspecting the pulse response at the output of the FFE [see Fig. 14(c)], we see that reflections are significantly reduced, particularly at the sampling points. This makes it unnecessary to do the impedance matching since the FFE is taking care of the reflections. The optimizer chooses a narrow transmission line, likely to reduce its introduced capacitance at the input of the chip.

Based on the preceding analysis, narrow transmission lines are preferred with sufficient equalization, along with a lower bandwidth, lower noise, and higher gain front-end. Such a design affords a lower power consumption in the AFE, but higher power in the DSP equalizer. With less equalization, a wider transmission line is preferable to ensure smaller reflections and better signal integrity. Note that the pitch of neighboring receiver lanes is typically limited by practical considerations, such as the pitch of mating fiber arrays, typically hundreds of micrometers, and is unaffected by trace width optimizations. Of course, the optimization could be constrained to accommodate especially narrow channel pitches, if and as required. Therefore, we conclude the following design guideline: sufficient equalization to cancel reflections results in a narrow transmission line for the optimal design; otherwise, impedance matching is needed, requiring a wider transmission line. These simulations highlight the importance of equalization in counteracting reflections.

## V. Experimental Validation

This section presents measurement results that illustrate the trends and tradeoffs elucidated by the automated optimization approach. Measurements were performed on a TIA prototype fabricated in 16 nm FinFET CMOS and flip-chip copackaged along with commercial PDs. Two copackaging arrangements were optimized for the same TIA, as shown in Fig. 15 with $\mathrm{TL}=250 \mu \mathrm{~m}$ and $\mathrm{TL}=500 \mu \mathrm{~m}$. Details of the complete frontend design are presented in [13] and [27].


Fig. 15. Copackaged prototype with TIA in 16-nm FinFET CMOS copackaged with arrayed PD with $\mathrm{TL}=250 \mu \mathrm{~m}$ and $\mathrm{TL}=500 \mu \mathrm{~m}$.


Fig. 16. Measured eye diagrams at $100 \mathrm{~Gb} / \mathrm{s}$ 4-PAM with (a) $\mathrm{TL}=$ $250 \mu \mathrm{~m}$, TW $=22 \mu \mathrm{~m}$, and $Z_{0}=75 \Omega$. (b) $\mathrm{TL}=500 \mu \mathrm{~m}$, TW $=60 \mu \mathrm{~m}$, and $Z_{0}=50 \Omega$.


Fig. 17. Measured vertical eye opening at $140 \mathrm{~Gb} / \mathrm{s}$ 4-PAM with $\mathrm{TL}=250 \mu \mathrm{~m}$ (a) FFE only. (b) FFE + DFE.

Although the prototype TIA's design parameters differ somewhat from the simulation model shown in Fig. 2, the same trends and tradeoffs are evident in the measured results. As predicted by the ML-assisted genetic optimizer in this work, the optimized interconnect is wider with $\mathrm{TW}=60 \mu \mathrm{~m}$ and a characteristic impedance $Z_{0}=50 \Omega$ for the longer trace, and narrower with $\mathrm{TW}=22 \mu \mathrm{~m}$ and a higher characteristic impedance $Z_{0}=75 \Omega$ for the shorter trace. This allows both copackaging arrangements to maintain comparable 4-PAM signal integrity up to $100 \mathrm{~Gb} / \mathrm{s}$, as shown by the unequalized TIA output eye diagrams in Fig. 16.

Furthermore, as in the ML-assisted genetic optimization, we see a dramatic improvement in signal integrity (quantified by the vertical eye opening after equalization measured on the oscilloscope) once the span of the equalizers is sufficient to compensate for reflections and ringing induced in the package. Results incorporating FFE and DFE equalizers and varying the number of taps are shown in Fig. 17 at $140 \mathrm{~Gb} / \mathrm{s}$. An 8-tap FFE with one precursor tap equalizes the combination of package-induced ISI and TIA bandwidth limitations, with additional taps providing little benefit. The inclusion of a 2-tap DFE provides a noticeable improvement, with little benefit from increasing the DFE length to 10 taps. The TIA has 32 GHz bandwidth, $45 \%$ of the baud rate in these experiments, comparable to the ML-assisted genetic optimization results.

## VI. Conclusion

The article presented the optimization of the interface between the PD and the AFE in high-speed, high-density optical receivers. We used the proposed framework to optimize transmission line width, the geometry of the T-coil, the inverter-based TIA, and FFE and DFE tap weights. We have applied a hybrid modeling methodology, consisting of analytical models, an EM simulation, and a NN model, to describe the interface and effectively optimize parameters. The framework is also used to draw insight into optimal design practices. For example, we have shown trends highlighting the relationship between the amount of equalization and the width of the transmission line. We showed that narrow transmission lines are favored when there is enough equalization. However, it should be noted that this could lead to high power consumption because of the increased number of taps required to counteract reflections. Therefore, a wider transmission line may be favored in power-efficient designs with limited equalization. These trends are further validated with measurements performed on a fabricated and assembled TIA prototype with various PD-to-TIA interface lengths at $100 \mathrm{~Gb} / \mathrm{s}$.

## Acknowledgment

The authors would like to thank Dr. Hossein Shakiba from Huawei Technologies for his valuable discussions throughout this project.

## References

[1] H. Li, C.-M. Hsu, J. Sharma, J. Jaussi, and G. Balamurugan, "A 100-Gb/s PAM-4 optical receiver with 2-Tap FFE and 2-Tap direct-feedback DFE in 28-nm CMOS," IEEE J. Solid-State Circuits, vol. 57, no. 1, pp. 44-53, Jan. 2022.
[2] K. R. Lakshmikumar et al., "A process and temperature insensitive CMOS linear TIA for 100 Gb/s/$\lambda$ PAM-4 optical links," IEEE J. Solid-State Circuits, vol. 54, no. 11, pp. 3180-3190, Nov. 2019.
[3] H. Li, G. Balamurugan, J. Jaussi, and B. Casper, "A 112 Gb/s PAM4 linear TIA with 0.96 pJ/bit energy efficiency in 28 nm CMOS," in Proc. IEEE 44th Eur. Solid State Circuits Conf., 2018, pp. 238-241.
[4] D. Patel, A. Sharif-Bakhtiar, and A. C. Carusone, "A 112 Gb/s -8.2 dBm sensitivity 4-PAM linear TIA in 16 nm CMOS with co-packaged photodiodes," in Proc. IEEE Custom Integr. Circuits Conf., 2022, pp. 1-2.
[5] F.-P. Chou, G.-Y. Chen, C.-W. Wang, Y.-C. Liu, W.-K. Huang, and Y.-M. Hsin, "Silicon photodiodes in standard CMOS technology," IEEE J. Sel. Topics Quantum Electron., vol. 17, no. 3, pp. 730-740, Mar. 2011.
[6] Z. Sheng, L. Liu, J. Brouckaert, S. He, and D. V. Thourhout, "InGaAs PIN photodetectors integrated on silicon-on-insulator waveguides," Opt. Exp., vol. 18, pp. 1756-1761, Jan. 2010.
[7] S. Song and Y. Sui, "System level optimization for high-speed SerDes: Background and the road towards machine learning assisted design frameworks," in Electronics, vol. 8, no. 11, 2019, Art. no. 1233.
[8] S. Katare, "Novel framework for modelling high speed interface using python for architecture evaluation," in Proc. IEEE Region 10 Conf., 2020, pp. 556-560.
[9] C. Xu et al., "A low BER adaptive sequence detection method for high-speed NRZ data transmission," in Proc. IEEE 7th Int. Conf. Integr. Circuits Microsystems, 2022, pp. 692-697.
[10] A. Manukovsky, Z. Khasidashvili, A. J. Norman, Y. Juniman, and R. Bloch, "Machine learning applications for simulation and modeling of 56 and 112 Gb SerDes systems," DesignCon, Santa Clara, CA, 2019.
[11] D. Yang, Y. Gan, V. Telang, M. Valliappan, and F. S. Tang, "Improving IBIS-AMI model accuracy: Model-to-model and model-to-lab correlation case studies," DesignCon, Santa Clara, CA, 2014.
[12] A. Novack et al., "Germanium photodetector with 60 GHz bandwidth using inductive gain peaking," Opt. Exp., vol. 21, pp. 28387-28393, Nov. 2013.
[13] D. Patel, A. Sharif-Bakhtiar, and T. C. Carusone, "A 112 Gb/s -8.2 dBm sensitivity 4-PAM linear TIA in 16 nm CMOS with co-packaged photodiodes," IEEE J. Solid-State Circuits, vol. 58, no. 3, pp. 771-784, Mar. 2023, doi: 10.1109/JSSC.2022.3218558.
[14] B. Dehlaghi, N. Wary, and T. C. Carusone, "Ultra-short-Reach interconnects for die-to-die links: Global bandwidth demands in microcosm," IEEE Solid-State Circuits Mag., vol. 11, no. 2, pp. 42-53, Feb. 2019.
[15] S. Lee et al., "Development of FCBGA substrate with low Dk/Df material based on automotive reliability conditions," in Proc. IEEE 21st Electron. Packag. Technol. Conf., 2019, pp. 271-275.
[16] J. Kim, J.-K. Kim, B.-J. Lee, and D.-K. Jeong, "Design optimization of on-chip inductive peaking structures for 0.13-$\mu$m CMOS 40-Gb/s transmitter circuits," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 56, no. 12, pp. 2544-2555, Dec. 2009.
[17] B. Razavi, "The bridged T-coil [a circuit for all seasons]," IEEE Solid-State Circuits Mag., vol. 7, no. 4, pp. 9-13, Apr. 2015.
[18] Z. Li and A. C. Carusone, "Design and optimization of T-coil-Enhanced ESD circuit with upsampling convolutional neural network," in Proc. IEEE/MTT-S Int. Microw. Symp., 2022, pp. 495-497.
[19] H. M. Torun et al., "A spectral convolutional net for co-optimization of integrated voltage regulators and embedded inductors," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., 2019, pp. 1-8.
[20] M. Swaminathan, H. M. Torun, H. Yu, J. A. Hejase, and W. D. Becker, "Demystifying machine learning for signal and power integrity problems in packaging," IEEE Trans. Compon. Packag. Manuf. Technol., vol. 10, no. 8, pp. 1276-1295, Aug. 2020.
[21] S. Daneshgar, H. Li, T. Kim, and G. Balamurugan, "A 128 Gb/s PAM4 linear TIA with 12.6 pA/$\sqrt{\mathrm{Hz}}$ noise density in 22 nm FinFET CMOS," in Proc. IEEE Radio Freq. Integr. Circuits Symp., 2021, pp. 135-138.
[22] K.-L. Fu and S.-I. Liu, "A 64-Gb/s PAM-4 optical receiver with amplitude/phase correction and threshold voltage/data level calibration," IEEE Trans. Very Large Scale Integrat. Syst., vol. 28, no. 7, pp. 1726-1735, Jul. 2020.
[23] L. Szilagyi, J. Pliva, R. Henker, D. Schoeniger, J. P. Turkiewicz, and F. Ellinger, "A 53-Gbit/s optical receiver frontend with 0.65 pJ/bit in 28-nm Bulk-CMOS," IEEE J. Solid-State Circuits, vol. 54, no. 3, pp. 845-855, Mar. 2019.
[24] I. Ozkaya et al., "A 64-Gb/s 1.4-pJ/b NRZ optical receiver data-path in 14-nm CMOS FinFET," IEEE J. Solid-State Circuits, vol. 52, no. 12, pp. 3458-3473, Dec. 2017.
[25] N. Al-Dhahir and J. M. Cioffi, "MMSE decision-feedback equalizers: Finite-length results," IEEE Trans. Inf. Theory, vol. 41, no. 4, pp. 961-975, Apr. 1995.
[26] "Measuring channel operating margin," 2016. Accessed: Apr. 18, 2023. [Online]. Available: https://dl.cdn-anritsu.com/en-us/test-measurement/files/Technical-Notes/White-Paper/11410-00989A.pdf
[27] D. Patel, B. Radi, A. Sharif-Bakhtiar, and A. Chan Carusone, "Experimental study of the equalization requirements of a 2.5D co-packaged 16-nm CMOS optical receiver up to 160 Gb/s," in Proc. Eur. Conf. Opt. Commun., 2022, pp. 1-4.


Bahaa Radi (Member, IEEE) received the B.S. degree in electrical engineering from The Hashemite University, Zarqa, Jordan, in 2012, the M.S. degree in microsystems engineering from Khalifa University, Abu Dhabi, United Arab Emirates, in 2015, and the Ph.D. degree in electrical engineering from McGill University, Montreal, QC, Canada, in 2021.
He is currently a Postdoctoral Fellow with the Integrated Systems Laboratory, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada. He will soon be joining Alphawave Semi as a Senior Analog Design Engineer. His current research interests include the design and optimization of energy-efficient optical systems for communication and computing applications, and the co-design of electronic and photonic integrated circuits.


Zonghao Li (Member, IEEE) received the B.A.Sc. and M.Eng. degrees in electrical engineering from the University of British Columbia, Vancouver, BC, Canada and McGill University, Montreal, QC, Canada, in 2017 and 2019, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada.

His research focuses on applying machine learning techniques to analog integrated circuit design and system-level high-speed SerDes modeling.


Dhruv Patel (Member, IEEE) received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, and the University of Toronto, Toronto, ON, in 2016 and 2020, respectively. Since 2019, he has been working toward the Ph.D. degree with the Integrated Systems Laboratory, University of Toronto, focusing on optical communication links in CMOS.
He was involved with variation tolerant subthreshold SRAM circuits research during undergraduate studies.
Mr. Patel was the recipient of the outstanding student paper award at the Custom Integrated Circuits Conference 2022, Ontario Graduate Scholarship, and the NSERC Scholarship for his doctoral studies.


Anthony Chan Carusone (Fellow, IEEE) received the Ph.D. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 2002.

Since 2002, he has been a Professor with the Department of Electrical and Computer Engineering, University of Toronto. Since 1997, he has been a Consultant to industry in the areas of integrated circuit design and digital communication. He is currently the Chief Technology Officer of Alphawave Semi in Toronto, Canada. He has coauthored Best Student Papers at the 2007, 2008, 2011, and 2022 Custom Integrated Circuits Conferences, Best Invited Paper at the 2010 Custom Integrated Circuits Conference, Best Paper at the 2005 Compound Semiconductor Integrated Circuits Symposium, Best Young Scientist Paper at the 2014 European Solid-State Circuits Conference, and the Best Paper at DesignCon 2021, and the popular textbooks "Analog Integrated Circuit Design" (along with D. Johns and K. Martin) and "Microelectronic Circuits" 8th edition (along with A. Sedra, K.C. Smith and V. Gaudet).

Professor Carusone was the Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS in 2009, and an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS 2010-2017. He was a Distinguished Lecturer for the IEEE Solid-State Circuits Society 2015-2017 and has served on the Technical Program Committee of several IEEE conferences including the International Solid-State Circuits Conference 2016-2021. He is currently the Editor-in-Chief of the IEEE Solid-State Circuits Letters.


[^0]:    Manuscript received 3 November 2022; revised 30 May 2023; accepted 14 August 2023. Date of publication 23 August 2023; date of current version 20 September 2023. This work was supported by Natural Sciences and Engineering Research Council of Canada. (Corresponding author: Bahaa Radi.)

    The authors are with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: bahaa.radi@isl.utoronto.ca; zonghao.li@isl.utoronto.ca; dhruv.patel@isl.utoronto.ca; tony.chan.carusone@isl.utoronto.ca).
    Digital Object Identifier 10.1109/TSIPI.2023.3307669

[^1]:    ${ }^{1}$ Source code: https://github.com/ChrisZonghaoLi/optical_receiver_optimization

[^2]:    ${ }^{2}$ Source code: https://github.com/ChrisZonghaoLi/upenn

